    Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers

    Big Data processing, especially with the increasing proliferation of Internet of Things (IoT) technologies and the convergence of IoT, edge, and cloud computing, involves handling massive and complex data sets on heterogeneous resources and incorporating different tools, frameworks, and processes to help organizations make sense of the data they collect from various sources. This set of operations, referred to as Big Data workflows, requires taking advantage of the elasticity of Cloud infrastructures for scalability. In this article, we present the design and prototype implementation of a Big Data workflow approach based on software container technologies, message-oriented middleware (MOM), and a domain-specific language (DSL) to enable highly scalable workflow execution and abstract workflow definition. We demonstrate our system in a use case and a set of experiments that show the practical applicability of the proposed approach for the specification and scalable execution of Big Data workflows. Furthermore, we compare the scalability of our proposed approach with that of Argo Workflows – one of the most prominent tools in the area of Big Data workflows – and provide a qualitative evaluation of the proposed DSL and the overall approach with respect to the existing literature.
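
    Since the abstract describes the container-plus-MOM execution model only in prose, the following is a minimal, illustrative sketch (not the authors' actual implementation) of how one containerized workflow step could consume tasks from a message queue and forward results to the queue of the next step. It assumes RabbitMQ and the Python pika client as a stand-in for the unspecified message-oriented middleware; the step name, queue names, and broker hostname are all hypothetical. Scaling a step out would then amount to running more container replicas of this consumer on the same input queue.

    # Sketch of one workflow-step worker; RabbitMQ/pika are assumptions,
    # not the middleware named in the article.
    import json
    import pika

    STEP_NAME = "clean-data"          # hypothetical step identifier
    INPUT_QUEUE = "wf.clean-data.in"  # hypothetical queue naming scheme
    OUTPUT_QUEUE = "wf.aggregate.in"  # queue of the (hypothetical) next step


    def process(record: dict) -> dict:
        """Placeholder for the step's actual analytics logic."""
        record["cleaned"] = True
        return record


    def on_message(channel, method, properties, body):
        result = process(json.loads(body))
        # Hand the result to the next step's queue, then acknowledge the
        # input message so the broker can safely remove it.
        channel.basic_publish(exchange="", routing_key=OUTPUT_QUEUE,
                              body=json.dumps(result))
        channel.basic_ack(delivery_tag=method.delivery_tag)


    def main():
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host="rabbitmq"))  # assumed broker host
        channel = connection.channel()
        channel.queue_declare(queue=INPUT_QUEUE, durable=True)
        channel.queue_declare(queue=OUTPUT_QUEUE, durable=True)
        # Process one message at a time so work spreads evenly across replicas.
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(queue=INPUT_QUEUE, on_message_callback=on_message)
        channel.start_consuming()


    if __name__ == "__main__":
        main()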

    Big Data Workflows: DSL-based Specification and Software Containers for Scalable Execution

    Big Data workflows are composed of multiple orchestration steps that perform different data analytics tasks. These tasks process heterogeneous data using various computing and storage resources. Due to the diversity of application domains, the technologies involved, and the complexity of the data sets, the design and implementation of Big Data workflows require collaboration between domain experts and technical experts. However, existing tools are too technical and do not easily allow domain experts to participate in defining and executing Big Data workflows. Moreover, the majority of existing tools are designed for specific applications such as bioinformatics, computational chemistry, and genomics, and are based on specific technology stacks that do not provide flexible means of code reuse and maintenance. This thesis presents the design and implementation of a Big Data workflow solution that uses a domain-specific language (DSL) to hide complex technical details, enabling domain experts to participate in defining workflows. The solution combines software container technologies and message-oriented middleware (MOM) to enable highly scalable workflow execution. Its applicability is demonstrated by implementing a prototype based on a real-world data workflow. The evaluations performed show that the proposed solution provides efficient workflow definition and scalable execution. Furthermore, the thesis presents the results of a set of experiments comparing the performance of the proposed approach with that of Argo Workflows, one of the most promising tools in the area of Big Data workflows.
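
    The abstract does not reproduce the DSL's concrete syntax, so the sketch below is only a hypothetical illustration, written in plain Python rather than the actual DSL, of the level of abstraction it aims for: a domain expert lists the steps, the container image implementing each step, and an optional degree of scale-out, while the queue wiring between consecutive steps is derived automatically. The step names, image references, and queue naming scheme are assumptions made for the example.

    # Illustrative only: a DSL-like workflow definition and a tiny "compiler"
    # that wires consecutive steps together via broker queues. Not the thesis'
    # actual DSL; all identifiers are hypothetical.
    from dataclasses import dataclass


    @dataclass
    class Step:
        name: str          # domain-level step name
        image: str         # container image implementing the step
        replicas: int = 1  # desired degree of scale-out


    workflow = [
        Step("ingest",    image="registry.example/ingest:latest"),
        Step("clean",     image="registry.example/clean:latest", replicas=4),
        Step("aggregate", image="registry.example/aggregate:latest"),
    ]


    def wire_queues(steps):
        """Derive per-step input/output queue names so consecutive steps are
        connected through the message broker (hypothetical naming scheme)."""
        plan = []
        for current, nxt in zip(steps, steps[1:] + [None]):
            plan.append({
                "step": current.name,
                "image": current.image,
                "replicas": current.replicas,
                "input_queue": f"wf.{current.name}.in",
                "output_queue": f"wf.{nxt.name}.in" if nxt else None,
            })
        return plan


    if __name__ == "__main__":
        for entry in wire_queues(workflow):
            print(entry)

    Running the module prints one execution-plan entry per step; in a system of the kind described above, such a plan would be handed to the container platform and the message broker, so that the domain expert never touches either directly.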
